#INTRODUCTION In this guide we want to investigate the question: “How does the length of maternity leave influence the share of female managers in the Nordic Region?”. Our process of this investigation will work as a guide on how to compare and see correlations between to quantitative datasets from OECD.Stat.Following this guide you could choose investigate and visualize other data sets. Our guide includes 3 steps:

  1. FIRST DATASET: Scrape, tidy, code and visualize statistical data from the dataset “Employment: Length of maternity leave, parental leave, and paid father-specific leave”

  2. SECOND DATASET: Scrape, tidy, code and visualize statistical data from the dataset “Employment: Share of female managers”

  3. MERGE DATASETS: Merge the two datasets to see if there is a correlation between the length of maternity leave and the share of female managers in the Nordic Region

LOADING PACKAGES

To get started we need to load different packages that will be useful in our script and examination. Each package allows us to access the data in different ways.

library("rvest") #to scrape information from web pages
## Loading required package: xml2
library(dplyr) #to abstract over how the data is stored
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr) #to create tidy data
library(stringr) #to work with strings as easy as possible
library(janitor) #to examine and clean dirty data
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(tidyverse) #to collect some of the most versatile R packages
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ forcats 0.5.0
## ✓ readr   1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter()         masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag()            masks stats::lag()
## x purrr::pluck()          masks rvest::pluck()
library(ggplot2) #to create visualizations and graphs

1: SCRAPING FIRST DATA FROM OECD.STAT

To investigate the datasets from OECD.Stat we will now scrape the website for data. I scrape the content of the website and extract the HTML table

url <- "https://stats.oecd.org/index.aspx?queryid=54760&fbclid=IwAR1Jt-NDjC_e4iBKlxrIpKSIBGtmsJYY1V2R6UmEIXT_sAelAk1_2S_YvRU#" 
# scrape the website. Here you insert the link of the data set of interest
url_html <- read_html(url)
We extract the whole HTML table through the

tag.

whole_table <- url_html %>% #we assign the data to a new object
 html_nodes("table") %>%
 html_table(fill = TRUE)  #str(whole_table) turns out to be a list

If you run a head() on the resulting whole_table, you will see that it is a list with unnamed elements marked by numbers in double brackets, such as [[1]].

new_table <- do.call(cbind,unlist(whole_table, recursive = FALSE)) 
## Warning in (function (..., deparse.level = 1) : number of rows of result is not
## a multiple of vector length (arg 33)
head(new_table) 
##      Indicator   Indicator   Indicator  
## [1,] "Age Group" "Age Group" "Age Group"
## [2,] "Unit"      "Unit"      "Unit"     
## [3,] "Sex"       "Sex"       "Sex"      
## [4,] "Time"      "Time"      "Time"     
## [5,] "Time"      "Time"      "Time"     
## [6,] "Time"      "Time"      "Time"     
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1990"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1991"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1992"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1993"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1994"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1995"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1996"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1997"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1998"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "1999"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2000"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2001"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2002"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2003"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2004"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2005"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2006"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2007"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2008"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2009"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2010"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2011"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2012"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2013"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2014"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2015"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      Length of maternity leaveLength of parental leave with job protectionTotal length of paid maternity and parental leaveLength of paid father-specific leave
## [1,] "Total"                                                                                                                                                   
## [2,] "Weeks"                                                                                                                                                   
## [3,] "Women"                                                                                                                                                   
## [4,] "2016"                                                                                                                                                    
## [5,] ""                                                                                                                                                        
## [6,] NA                                                                                                                                                        
##      X1                                                             X2
## [1,] "Data extracted on 07 Dec 2020 16:30 UTC (GMT) from OECD.Stat" NA
## [2,] "Data extracted on 07 Dec 2020 16:30 UTC (GMT) from OECD.Stat" NA
## [3,] "Data extracted on 07 Dec 2020 16:30 UTC (GMT) from OECD.Stat" NA
## [4,] "Data extracted on 07 Dec 2020 16:30 UTC (GMT) from OECD.Stat" NA
## [5,] "Data extracted on 07 Dec 2020 16:30 UTC (GMT) from OECD.Stat" NA
## [6,] "Data extracted on 07 Dec 2020 16:30 UTC (GMT) from OECD.Stat" NA
##      X1           X2           X3 X1           X2
## [1,] "Loading..." "Loading..." NA "Loading..." NA
## [2,] "Loading..." ""           NA "Loading..." NA
## [3,] "Loading..." "Loading..." NA "Loading..." NA
## [4,] "Loading..." ""           NA "Loading..." NA
## [5,] "Loading..." "Loading..." NA "Loading..." NA
## [6,] "Loading..." ""           NA "Loading..." NA

The line above takes the whole table from the website with all rows and column. This is very difficult to read. It does not look nice and it shows things that we are not interested in. Therefore we tidy the data in the following section of this guide.

Tidying the data

In our case we only want to look at the countries’ average of weeks of maternity leave through the time period (1990 to 2016). Therefore we filter the data as shown in the following lines

new_table <- as.data.frame(new_table) #create a new table
mleave <- new_table[8:44,2:30] #choose the rows and columns we want to look at 
mleave <- mleave[,c(1,3:29)] #choose the columns we want to both eliminate and look at
colnames(mleave) <- c("country", "1990", "1991", "1992", "1993", "1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016") #assign new column names to years

mleave <- as.data.frame(lapply(mleave, function(y) gsub("\\.\\.", NA, y))) #change the .. to NA, så R can read the data

In the table the numbers are set at characters. In the following I change these to numeric, so that R can make read the table correctly.

cols.num <- c("X1990", "X1991", "X1992", "X1993", "X1994", "X1995", "X1996", "X1997", "X1998", "X1999", "X2000", "X2001", "X2002", "X2003", "X2004", "X2005", "X2006", "X2007", "X2008", "X2009", "X2010", "X2011", "X2012", "X2013", "X2014", "X2015", "X2016") #columns from characters to numeric
mleave[cols.num] <- sapply(mleave[cols.num],as.numeric) #recode characters as numeric
sapply(mleave, class) #print classes of all colums
##     country       X1990       X1991       X1992       X1993       X1994 
## "character"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric" 
##       X1995       X1996       X1997       X1998       X1999       X2000 
##   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric" 
##       X2001       X2002       X2003       X2004       X2005       X2006 
##   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric" 
##       X2007       X2008       X2009       X2010       X2011       X2012 
##   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric" 
##       X2013       X2014       X2015       X2016 
##   "numeric"   "numeric"   "numeric"   "numeric"

In the table the countries are set to be characters. In the following I change these to factors.

mleave$country <- as.factor(mleave$country) #columns from characters to factor

In the following I restructure the table so it knows that every country has numbers from different years.

mleave_long <- mleave %>% gather(year, weeks_mleave, -country) #restructuring the table so it knows that every country has numbers from different years

Now the table is readable for R.

VISUALIZATIONS

Now that we have finished tidying the table from OECD.Stat, we can begin making visualizations of this independent data set. In order to do so we need a few more packages.

library(gifski) #to convert images to GIF animations
library(png) #to read, write and display the PNG format
library(scales) #to manipulate and polish visualizations
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(ggthemes) #to make nice visualizations
library(gganimate) #to make animated visualizations

There are several ways of visualizing data. You could e.g. look up line geoms, point geoms, bars geoms. One way of vizualizing data is the following.

mleave_long %>% 
  ggplot(aes(x = year, y = weeks_mleave))+
  geom_jitter(aes(color = country))
## Warning: Removed 170 rows containing missing values (geom_point).

Another way of visualizing the data can be done like this:

ggplot(mleave_long) +
  geom_line(mapping = aes(x = year, y = weeks_mleave, group = country, color = country))
## Warning: Removed 170 row(s) containing missing values (geom_path).

As we can see above, many years are included in the visualization. We want to make our visualizations more clear and specific by focusing on the countries in the Nordic Region: Denmark, Iceland, Sweden, Norway and Finland. We chose to compare these countries, as they have similiar scandinavian welfare models.

mnordic <-  mleave_long %>% filter(country %in% c("Denmark","Iceland", "Sweden","Norway", "Finland")) #assigning the nordic countries
mnordic$year <- str_remove(mnordic$year, "X") #filtering out the "X" that popped in to our model earlier

``

Now that we have created this object with the specific countries we will use it to visualize the differences within the countries. We will put it in the earlier ggplot-visualization and add a title.

ggplot(mnordic) +
  geom_line(mapping = aes(x = year, y = weeks_mleave, group = country, color = country)) + ggtitle("Length of Maternity Leave in Nordic Countries") +theme_solarized()+ scale_color_solarized()+
  xlab("Year")+ #naming the x axis
  ylab("Weeks") #naming the y axis

Here we used the theme solarized. You could choose among many different from this site: https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/

The visualization above shows the length of maternity leave. With this script it is possible to compare the different countries and to select specific countries you want to compare.

Furthermore, since the dataset which we want to compare this with only have data from 2011-2018, we filter the years so our new assigned value will include the countries 2011-2016. We assign this ‘mnselected’

mnselected <-  mleave_long %>% filter(country %in% c("Denmark","Iceland", "Sweden","Norway", "Finland")) %>%  filter(year %in% c("X2011", "X2012", "X2013", "X2014", "X2015", "X2016"))
mnselected$year <- str_remove(mnselected$year, "X") #filtering out the "X" that popped in to our model earlier
mnselected
##    country year weeks_mleave
## 1  Denmark 2011         18.0
## 2  Finland 2011         17.5
## 3  Iceland 2011         13.0
## 4   Norway 2011          9.0
## 5   Sweden 2011         15.6
## 6  Denmark 2012         18.0
## 7  Finland 2012         17.5
## 8  Iceland 2012         13.0
## 9   Norway 2012          9.0
## 10  Sweden 2012         15.6
## 11 Denmark 2013         18.0
## 12 Finland 2013         17.5
## 13 Iceland 2013         13.0
## 14  Norway 2013          9.0
## 15  Sweden 2013         15.6
## 16 Denmark 2014         18.0
## 17 Finland 2014         17.5
## 18 Iceland 2014         13.0
## 19  Norway 2014         17.0
## 20  Sweden 2014         15.6
## 21 Denmark 2015         18.0
## 22 Finland 2015         17.5
## 23 Iceland 2015         13.0
## 24  Norway 2015         13.0
## 25  Sweden 2015         15.6
## 26 Denmark 2016         18.0
## 27 Finland 2016         17.5
## 28 Iceland 2016         13.0
## 29  Norway 2016         13.0
## 30  Sweden 2016         19.9

2: SCRAPING SECOND DATA FROM OECD.STAT

Scraping Data

You can now choose another dataset that you would like to work with. Here we scrape the content of the website and extract the HTML table:

url <- "https://stats.oecd.org/index.aspx?queryid=96330"
# scrape the website
url_html <- read_html(url)
The following option let’s us extract the whole HTML table through the

tag.

whole_table <- url_html %>% 
 html_nodes("table") %>%
 html_table(fill = TRUE)  #str(whole_table) turns out to be a list

If you run a head() on the resulting whole_table, you will see that it is a list with unnamed elements marked by numbers in double brackets, such as [[1]].

new_table <- do.call(cbind,unlist(whole_table, recursive = FALSE)) 
head(new_table) 
##      Indicator   Indicator   Indicator   EMP10NEW: Share of female managers
## [1,] "Sex"       "Sex"       "Sex"       "Women"                           
## [2,] "Age Group" "Age Group" "Age Group" "Total"                           
## [3,] "Unit"      "Unit"      "Unit"      "Percentage"                      
## [4,] "Time"      "Time"      "Time"      "2011"                            
## [5,] "Time"      "Time"      "Time"      ""                                
## [6,] "Time"      "Time"      "Time"      NA                                
##      EMP10NEW: Share of female managers EMP10NEW: Share of female managers
## [1,] "Women"                            "Women"                           
## [2,] "Total"                            "Total"                           
## [3,] "Percentage"                       "Percentage"                      
## [4,] "2012"                             "2013"                            
## [5,] ""                                 ""                                
## [6,] NA                                 NA                                
##      EMP10NEW: Share of female managers EMP10NEW: Share of female managers
## [1,] "Women"                            "Women"                           
## [2,] "Total"                            "Total"                           
## [3,] "Percentage"                       "Percentage"                      
## [4,] "2014"                             "2015"                            
## [5,] ""                                 ""                                
## [6,] NA                                 NA                                
##      EMP10NEW: Share of female managers EMP10NEW: Share of female managers
## [1,] "Women"                            "Women"                           
## [2,] "Total"                            "Total"                           
## [3,] "Percentage"                       "Percentage"                      
## [4,] "2016"                             "2017"                            
## [5,] ""                                 ""                                
## [6,] NA                                 NA                                
##      EMP10NEW: Share of female managers
## [1,] "Women"                           
## [2,] "Total"                           
## [3,] "Percentage"                      
## [4,] "2018"                            
## [5,] ""                                
## [6,] NA                                
##      X1                                                             X2
## [1,] "Data extracted on 07 Dec 2020 17:05 UTC (GMT) from OECD.Stat" NA
## [2,] "Data extracted on 07 Dec 2020 17:05 UTC (GMT) from OECD.Stat" NA
## [3,] "Data extracted on 07 Dec 2020 17:05 UTC (GMT) from OECD.Stat" NA
## [4,] "Data extracted on 07 Dec 2020 17:05 UTC (GMT) from OECD.Stat" NA
## [5,] "Data extracted on 07 Dec 2020 17:05 UTC (GMT) from OECD.Stat" NA
## [6,] "Data extracted on 07 Dec 2020 17:05 UTC (GMT) from OECD.Stat" NA
##      X1           X2           X3 X1           X2
## [1,] "Loading..." "Loading..." NA "Loading..." NA
## [2,] "Loading..." ""           NA "Loading..." NA
## [3,] "Loading..." "Loading..." NA "Loading..." NA
## [4,] "Loading..." ""           NA "Loading..." NA
## [5,] "Loading..." "Loading..." NA "Loading..." NA
## [6,] "Loading..." ""           NA "Loading..." NA

The line above takes the whole table from the website with some rows and coloumns. It does not look nice and it shows thing that I am not interested in.

Tidying the data

In our case we only want to look at the countries’ share of female managers through the time period (2011 to 2018). Therefore we filter the data as shown in the following lines

fmanagers <- new_table[8:52,2:18] #choose the rows and columns we want to look at 
fmanagers <- fmanagers[,c(1,3:10)] #choose the columns we want to both eliminate and look at
colnames(fmanagers) <- c("country", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018") #assign new column names to years

I can see that the data is listed in characters (), and I want to convert some of it to numeric to be able to create visualizations (I got help from this website: https://stackoverflow.com/questions/22772279/converting-multiple-columns-from-character-to-numeric-format-in-r)

Now we convert our table to be ‘as tibble’. In that way the numbers will be converted into dataframes, which will help our further analysis

fmanagers <- as.tibble(fmanagers)
## Warning: `as.tibble()` is deprecated as of tibble 2.0.0.
## Please use `as_tibble()` instead.
## The signature and semantics have changed, see `?as_tibble`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
cols.num <- c("2011","2012","2013","2014","2015","2016","2017","2018") # we tell r which colons should be numeric
fmanagers[cols.num] <- sapply(fmanagers[cols.num],as.numeric) # in this section we convert the colomns in our dataset to be numeric instead of characters

To be able to create the wanted visualizations, I will need to assign the different countries as factors so that I can distinguish between them. In the following I will ascribe the countries as factors:

fmanagers$country <- as.character(fmanagers$country)
sapply(fmanagers, class)
##     country        2011        2012        2013        2014        2015 
## "character"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric" 
##        2016        2017        2018 
##   "numeric"   "numeric"   "numeric"

In the following we restructure the table so that R can read it.

fmanagers_long <- fmanagers %>% gather(year, percentage, -country) # I want to tidy my data further so that there will be a specific column with the year assigned. Now I can hopefully start visualizing my data...

Now we have finished tidying the data and restructuring the table. Therefore we will now be able to move on with the visualizations

Visualizations of second data set

theme_set(theme_bw()) #set the theme to black and white for better visualization
ggplot(fmanagers_long) +
  geom_line(mapping = aes(x = year, y =percentage, group = country)) #by selecting the country and mapping that the visualization should include years, countries and percentage

Here we have all the years and all the countries listed. But it does not look very manageable, so we will try to pick out the countries we want to compare and illustrate them in different colors. The chosen countries are: Denmark, Iceland, Colombia, Italy, and the OECD average.

fmanagers_long %>% 
  ggplot(aes(x =  year, y = percentage))+
  geom_jitter(aes(color = country))

Another way of making visualizations is the following:

ggplot(fmanagers_long) +
  geom_line(mapping = aes(x =  year, y = percentage, group = country, color = country))

As there are so many countries in the data set, the vizualization is not very nice. Therefore we now want to select some countries. We choose five countries; Denmark, Iceland, Finland, Norway and Sweden to see how the share of female managers and the development in these countries look. Therefore we create a new object called ‘fnordic’.

fnordic <-  fmanagers_long %>% filter(country %in% c("Denmark","Iceland", "Sweden","Norway", "Finland")) #here you can chose the countries
fnordic
## # A tibble: 40 x 3
##    country year  percentage
##    <chr>   <chr>      <dbl>
##  1 Denmark 2011        27.6
##  2 Finland 2011        31.6
##  3 Iceland 2011        39.3
##  4 Norway  2011        32.9
##  5 Sweden  2011        34.3
##  6 Denmark 2012        28.2
##  7 Finland 2012        29.4
##  8 Iceland 2012        39.7
##  9 Norway  2012        33.6
## 10 Sweden  2012        35.3
## # … with 30 more rows

Now that we have created this object with the specific countries we will use it to visualize the differences within the countries. We will put it in the earlier ggplot-visualization and add a title.

ggplot(fnordic) +
  geom_line(mapping = aes(x =  year, y = percentage, group = country, color = country)) + ggtitle("Share of Female Managers in Nordic Countries")+ theme_solarized()+ scale_color_solarized()+ #add title
  xlab("Year")+ #to rename the x-axis
  ylab("Percentage")#to rename the y-axis

Here we used the theme solarized. You could choose among many different from this site: https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/

The visualization above shows the share of female managers. With this script it is possible to compare the different countries and to select specific countries you want to compare. As we can see above, the countries that Denmark normally compare it self with are doing a lot better than Denmark when it comes to the share of female managers. Also, Denmark (and Finland) is doing worse than the OECD average which includes the 45 OECD countries.

Furthermore, since the dataset which we want to compare this with have data from 1990-2016, we filter the years so our new assigned value will include the countries 2011-2016. We assign this ‘fnselected’

fnselected <-  fmanagers_long %>% filter(country %in% c("Denmark","Iceland", "Sweden","Norway", "Finland")) %>% filter(year %in% c("2011","2012", "2013","2014", "2015", "2016")) #here you can chose the countries and the years you want to focus on

Here we visualize our new assigned function, ‘fnselected’:

ggplot(fnselected) +
  geom_line(mapping = aes(x =  year, y = percentage, group = country, color = country)) + ggtitle("Share of Female Managers in Nordic Countries")+ theme_solarized()+ scale_color_solarized()+ #add title
  xlab("Year")+ #to rename the x-axis
  ylab("Percentage")#to rename the y-axis

To make it easier to follow, we will post the visualization showing the length of maternity leave in the Nordic Region, ‘mnselected’

ggplot(mnselected) +
  geom_line(mapping = aes(x = year, y = weeks_mleave, group = country, color = country)) + ggtitle("Length of Maternity Leave in Nordic Countries") +theme_solarized()+ scale_color_solarized()+ #add a title
  xlab("Year")+ #naming the x axis
  ylab("Weeks") #naming the y axis

Now, that we can see these two visualizations showing the development from 2011-2016 in the Nordic Region, we want to show how it is possible to merge the two datasets/visualizations!

MERGING THE DATASETS AND CREATING ANIMATION

We have worked with two data sets from OECD.Stat independently. In the following section we merge the two data sets to and create an animation. We ask the question: How does the length of maternity leave influence the share of female managers in the Nordic countries?

Firstly we put the data sets together. We do this by the function ‘merged’. You want to make sure that your two datasets have the same variables and rows to compare them.

merged <- merge(mnordic, fnordic, by=c("year", "country"))

Then we will try to make visualizations on this. At first, we make the comparisons within a certain year, for instance 2016.

# I will try to merge the two statistics with specifik years

merged2016 <- filter(merged, year == "2016") #we can compare the different countries on maternity leave and share of female managers in 2016

# Here, we will visualize the status in 2016 on the length of maternity leave and share of female managers in the Nordic Region. We call this animation 'anim1', but you could also call it something else

anim1 <- ggplot(merged2016,aes(x = weeks_mleave, y = percentage, group = country, color = country))+
  geom_point(size=8, alpha = 0.5)+
  xlim(10,24)+
  ylim(25,41)+
  xlab("Weeks")+
  ylab("Percentage")+
  ggtitle("MATERNITY LEAVES' INFLUENCE ON SHARE OF FEMALE MANAGERS")+
  geom_text(aes(label=country),hjust=0, vjust=2.5)+theme_solarized()+ scale_color_solarized()+
  theme(legend.position = "none")

anim1 

Now we want to make a moving visualization with changing years.

class(merged$year) #the class of year is character, so we need to change it
## [1] "character"
merged$year <- as.numeric(as.character(merged$year)) #we change 'year' to numeric, so the visualization will be able to move through the years as numbers

anim2 <-ggplot(merged, aes(x = weeks_mleave, y = percentage, group = country, color = country))  +
  geom_point(size=8, alpha = 0.4)+
  transition_time(year)+
   xlab("Weeks of maternity leave")+
  ylab("Percentage of female managers")+
  labs(title = "Year: {round(frame_time)}")+
  theme_solarized()
anim2 #we call this visualization anim2, but it could also have a different name

Now we have maked the data move! As we can see, Denmark is the country with the longest length of maternity leave and the lowest share of female managers. However it does not look at if there is a direct correlation between the length of maternity leave and the share of female managers.

If you want to use this animation in powerpoints or other kinds of presentations, it is a good idea to save it as a GIF. Below is a guide on that:

#Make it a GIF
animate(anim2, fps = 10, width = 750, height = 450) #we want to make it a different format

anim_save("gender.gif")

WRAP IT UP

Now, we have learned how to scrape, code, tidy and visualize data from OECD.Stat. Good job! We hop you can use this in your future research. If you want to contact us, our GitHubs are: “helenefonne” and “sofia-sif”.